AITopics | retain dataset

retain dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BLUR: A Bi-Level Optimization Approach for LLM Unlearning

Reisizadeh, Hadi, Jia, Jinghan, Bu, Zhiqi, Vinzamuri, Bhanukiran, Ramakrishna, Anil, Chang, Kai-Wei, Cevher, Volkan, Liu, Sijia, Hong, Mingyi

arXiv.org Artificial Intelligence, Oct-21-2025

Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing unlearning algorithms, it remains unclear how best to formulate the unlearning problem. The most popular formulation uses a weighted sum of forget and retain losses, but it often degrades performance because of the inherent trade-off between the two losses. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation where the lower-level objective focuses on minimizing the forget loss, while the upper-level objective aims to maintain the model's utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all the state-of-the-art algorithms across various unlearning tasks, models, and metrics. Code is available at https://github.com/OptimAI-Lab/BLURLLMUnlearning.
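
The difference between the weighted-sum and hierarchical formulations can be seen on a two-parameter toy problem. This is a minimal sketch, not BLUR: it uses a generic penalty method for bi-level problems, and the quadratic forget/retain losses, step sizes, and penalty schedule are all hypothetical. The forget loss has a whole line of minimizers (any x with x[0] = 1), while the retain loss pulls x[0] toward 0, so the two objectives conflict.

```python
import numpy as np

# Hypothetical toy losses (not from the paper).
# Forget loss: every point with x[0] == 1 is a minimizer, so the lower level has many solutions.
def forget_loss(x):
    return (x[0] - 1.0) ** 2

def forget_grad(x):
    return np.array([2.0 * (x[0] - 1.0), 0.0])

# Retain loss: pulls x[0] toward 0 (conflicting with forgetting) and x[1] toward 2.
def retain_loss(x):
    return x[0] ** 2 + (x[1] - 2.0) ** 2

def retain_grad(x):
    return np.array([2.0 * x[0], 2.0 * (x[1] - 2.0)])

def weighted_sum(lam=0.5, steps=2000, lr=0.25):
    """Gradient descent on the weighted-sum formulation lam * forget + (1 - lam) * retain."""
    x = np.zeros(2)
    for _ in range(steps):
        x = x - lr * (lam * forget_grad(x) + (1.0 - lam) * retain_grad(x))
    return x

def penalty_bilevel(gammas=(1.0, 10.0, 100.0, 1000.0), steps=2000):
    """Approximate 'minimize retain(x) subject to x in argmin forget(x)' by minimizing
    retain + gamma * forget for an increasing penalty gamma, warm-starting each level."""
    x = np.zeros(2)
    for gamma in gammas:
        lr = 0.5 / (1.0 + gamma)  # 1/L: the penalized objective is L-smooth with L = 2(1 + gamma)
        for _ in range(steps):
            x = x - lr * (retain_grad(x) + gamma * forget_grad(x))
    return x

for name, x in [("weighted sum", weighted_sum()), ("penalty bi-level", penalty_bilevel())]:
    print(f"{name}: x={x.round(4)}, forget={forget_loss(x):.2e}, retain={retain_loss(x):.3f}")
```

With these settings the weighted sum settles at x[0] = lam = 0.5, so its forget loss stays at 0.25 however long it runs, while the penalty iterate reaches x[0] = gamma / (1 + gamma) at each level and so approaches full forgetting as gamma grows, still keeping x[1] = 2 for the retain objective.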

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2506.08164

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection

Wu, Chengcan, Wei, Zeming, Chen, Huanran, Dong, Yinpeng, Sun, Meng

arXiv.org Artificial Intelligence, Aug-22-2025

While Large Language Models (LLMs) have demonstrated impressive performance in various domains and tasks, concerns about their safety are becoming increasingly severe. In particular, since models may store unsafe knowledge internally, machine unlearning has emerged as a representative paradigm for ensuring model safety. Existing approaches employ various training techniques, such as gradient ascent and negative preference optimization, in attempts to eliminate the influence of undesired data on target models. However, these methods merely suppress the activation of undesired data through parametric training without completely eradicating its informational traces within the model. This fundamental limitation makes it difficult to achieve effective continuous unlearning and leaves these methods vulnerable to relearning attacks. To overcome these challenges, we propose a Metamorphosis Representation Projection (MRP) approach that pioneers the application of irreversible projection properties to machine unlearning. By implementing projective transformations in the hidden state space of specific network layers, our method effectively eliminates harmful information while preserving useful knowledge. Experimental results demonstrate that our approach enables effective continuous unlearning and successfully defends against relearning attacks, achieving state-of-the-art unlearning effectiveness while preserving natural performance. Our code is available at https://github.com/ChengcanWu/MRP.
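
The abstract does not spell out MRP's projection, so the sketch below is only a generic illustration of why a projection in hidden-state space is irreversible: an orthogonal projector onto the complement of some "unwanted" directions zeroes those components and has reduced rank, so no linear map can restore them. The directions and hidden states are random and hypothetical; in practice they would be derived from a specific layer of the model.

```python
import numpy as np

def erase_projector(directions):
    """Orthogonal projector P = I - Q Q^T onto the complement of span(directions).

    directions: (d, k) array whose columns span the subspace to remove.
    """
    q, _ = np.linalg.qr(directions)  # reduced QR: q is (d, k) with orthonormal columns
    return np.eye(directions.shape[0]) - q @ q.T

rng = np.random.default_rng(0)
d, k = 16, 2
unwanted = rng.normal(size=(d, k))   # hypothetical directions carrying unwanted information
P = erase_projector(unwanted)

hidden = rng.normal(size=(8, d))     # hypothetical batch of hidden states, one per row
projected = hidden @ P               # P is symmetric, so this applies P to every row

# After projection nothing is left along the unwanted directions ...
print("max residual:", np.abs(projected @ unwanted).max())
# ... and P has rank d - k, so the removed component cannot be linearly recovered.
print("rank of P:", np.linalg.matrix_rank(P), "of", d)
```

Because P is idempotent (P @ P equals P), applying it repeatedly changes nothing further, which is one reason projection-style edits behave differently from the gradient-based suppression the abstract criticizes.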

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2508.15449

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models

P, Dinesh Srivasthav, Garlapati, Bala Mallikarjunarao

arXiv.org Artificial Intelligence, Mar-2-2025

Large Language Models (LLMs) face significant challenges in maintaining privacy, ethics, and compliance when sensitive or obsolete data must be selectively removed. Retraining these models from scratch is computationally infeasible, necessitating efficient alternatives. As part of SemEval 2025 Task 4, this work applies selective unlearning in LLMs to address this challenge. In this paper, we present our experiments and findings, primarily leveraging global weight modification to balance unlearning effectiveness, knowledge retention, and the target model's post-unlearning utility. We also detail the task-specific evaluation mechanism, results, and challenges. Our algorithms achieved aggregate scores of 0.409 and 0.389 on the test set for the 7B and 1B target models, respectively, demonstrating promising results in verifiable LLM unlearning.

dataset, language model, target model

arXiv.org Artificial Intelligence

2503.04795

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Singapore (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.51)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Do Unlearning Methods Remove Information from Language Model Weights?

Deeb, Aghyad, Roger, Fabien

arXiv.org Artificial Intelligence, Nov-10-2024

Large Language Models' knowledge of how to perform cyber-security attacks, create bioweapons, and manipulate humans poses risks of misuse. Previous work has proposed methods to unlearn this knowledge. Historically, it has been unclear whether unlearning techniques are removing information from the model weights or just making it harder to access. To disentangle these two objectives, we propose an adversarial evaluation method to test for the removal of information from model weights: we give an attacker access to some facts that were supposed to be removed, and using those, the attacker tries to recover other facts from the same distribution that cannot be guessed from the accessible facts. We show that using fine-tuning on the accessible facts can recover 88% of the pre-unlearning accuracy when applied to current unlearning methods, revealing the limitations of these methods in removing information from the model weights.

During pretraining, Large Language Models (LLMs) acquire many capabilities, both intended and unintended (Wei et al., 2022). These capabilities have raised concerns about LLMs acquiring dangerous capabilities that can be exploited by malicious actors, such as assisting in cyber-attacks or creating bioweapons (Fang et al., 2024). Acknowledging these threats, the Executive Order on Artificial Intelligence (White House, 2023) has emphasized the importance of responsible development of AI models. To address these concerns, LLMs are typically trained to refuse to engage in dangerous activities. Refusal is vulnerable to jailbreak techniques (Wei et al., 2023; Zou et al., 2023; Liu et al., 2024b) and other attacks.

Figure 1: Our approach to evaluate unlearning: we try to recover potentially hidden facts by retraining on facts independent of the facts used for evaluation but coming from the same distribution (left). Using this procedure, we find that we are able to recover a large fraction of performance when using state-of-the-art unlearning methods like RMU (Li et al., 2024b) (right).
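
The evaluation protocol described above can be sketched as a small harness: split the forgotten facts into an attacker-accessible set and a disjoint held-out set, fine-tune the unlearned model on the accessible part only, and score it on the held-out part. This is a minimal sketch under stated assumptions: `fine_tune` and `accuracy` are hypothetical callbacks supplied by the user, the 50/50 split is arbitrary, and reporting recovery as held-out accuracy after the attack divided by the original model's held-out accuracy is one plausible reading of "fraction of pre-unlearning accuracy", not necessarily the paper's exact metric. The dictionary "models" at the end are toy stand-ins.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Model = Any  # the model type is left abstract

def split_facts(facts: Sequence[str], accessible_frac: float = 0.5,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split facts from one distribution into an attacker-accessible set
    and a disjoint held-out set that is used only for evaluation."""
    shuffled = list(facts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * accessible_frac)
    return shuffled[:cut], shuffled[cut:]

def relearning_recovery(original: Model, unlearned: Model, forget_facts: Sequence[str],
                        fine_tune: Callable[[Model, List[str]], Model],
                        accuracy: Callable[[Model, List[str]], float],
                        accessible_frac: float = 0.5, seed: int = 0) -> float:
    """Fraction of the original model's held-out accuracy that the attack restores."""
    accessible, held_out = split_facts(forget_facts, accessible_frac, seed)
    attacked = fine_tune(unlearned, accessible)   # attacker trains on accessible facts only
    before = accuracy(original, held_out)         # pre-unlearning accuracy on held-out facts
    return accuracy(attacked, held_out) / before if before > 0 else 0.0

# Toy stand-ins: a "model" either still stores facts but suppresses them, or has erased them.
facts = [f"fact-{i}" for i in range(10)]
original = {"stored": set(facts), "suppressed": False}
hidden = {"stored": set(facts), "suppressed": True}
erased = {"stored": set(), "suppressed": True}

def toy_fine_tune(model, new_facts):
    # Training on any fact lifts suppression and stores the facts it saw.
    return {"stored": model["stored"] | set(new_facts), "suppressed": False}

def toy_accuracy(model, eval_facts):
    if model["suppressed"]:
        return 0.0
    return sum(f in model["stored"] for f in eval_facts) / len(eval_facts)

print("hidden knowledge recovered:", relearning_recovery(original, hidden, facts, toy_fine_tune, toy_accuracy))
print("erased knowledge recovered:", relearning_recovery(original, erased, facts, toy_fine_tune, toy_accuracy))
```

The key design choice is that the held-out facts never reach `fine_tune`, so any accuracy regained on them must come from information still present in the weights rather than from relearning.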

accuracy, dataset, information

arXiv.org Artificial Intelligence

2410.08827

Country:

North America > United States (0.34)
Europe > Spain (0.05)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge

Zhang, Zhiwei, Wang, Fali, Li, Xiaomin, Wu, Zongyu, Tang, Xianfeng, Liu, Hui, He, Qi, Yin, Wenpeng, Wang, Suhang

arXiv.org Artificial Intelligence, Oct-21-2024

Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. Machine unlearning has been introduced as a viable solution to remove the influence of problematic content, such as sensitive training data, without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible. Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information. We conduct comprehensive experiments using various quantization techniques across multiple precision levels to thoroughly evaluate this phenomenon. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision, which significantly increases to 83% after 4-bit quantization. Based on our empirical findings, we provide a theoretical explanation for the observed phenomenon and propose a quantization-robust unlearning strategy aimed at mitigating this intricate issue. Our results highlight a fundamental tension between preserving the utility of the unlearned model and preventing knowledge recovery through quantization, emphasizing the challenge of balancing these two objectives. Altogether, our study underscores a major failure in existing unlearning methods for LLMs, strongly advocating for more comprehensive and robust strategies to ensure authentic unlearning without compromising model utility.

Large language models (LLMs) have exhibited remarkable abilities in generating human-like text, owing to their training on extensive datasets (Zhao et al., 2023).
However, LLMs can also unintentionally learn and reproduce undesirable behaviors from sensitive training data (Liu et al., 2024a; Sun et al., 2024). Furthermore, laws such as the European Union General Data Protection Regulation (GDPR) (Voigt & Von dem Bussche, 2017) have introduced the "Right to be Forgotten", allowing users to request the removal of their personal data from trained models (Xu et al., 2024a).
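
One intuition for why quantization can undo unlearning: if a utility-constrained unlearning method changes the weights only slightly, most weights round to the same quantization code as before, so the quantized unlearned model largely coincides with the quantized original. The toy below illustrates that rounding effect only; it is not the paper's theoretical analysis. It assumes a simple symmetric per-tensor round-to-nearest quantizer (real 4-bit schemes differ) and a random weight vector with a hypothetical perturbation of scale 1e-3 standing in for the unlearning update.

```python
import numpy as np

def rtn_codes(w, bits):
    """Symmetric per-tensor round-to-nearest quantization; returns the integer codes."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int64)

rng = np.random.default_rng(0)
w_original = rng.normal(size=10_000)                        # hypothetical pre-unlearning weights
w_unlearned = w_original + 1e-3 * rng.normal(size=10_000)  # small, utility-preserving update

print("identical in full precision:", np.mean(w_original == w_unlearned))
for bits in (8, 4):
    same = np.mean(rtn_codes(w_original, bits) == rtn_codes(w_unlearned, bits))
    print(f"identical {bits}-bit codes: {same:.3f}")
```

Codes are compared rather than dequantized values because each model gets its own scale; coarser grids (fewer bits) have wider rounding buckets, so more of the small unlearning update is absorbed.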

large language model, natural language, quantization

arXiv.org Artificial Intelligence

2410.16454

Country:

North America > United States > Virginia (0.04)
North America > United States > Pennsylvania (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models

Veldanda, Akshaj Kumar, Zhang, Shi-Xiong, Das, Anirban, Chakraborty, Supriyo, Rawls, Stephen, Sahu, Sambit, Naphade, Milind

arXiv.org Artificial Intelligence, Sep-19-2024

Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that (1) performs reverse gradient on the unlearning dataset (problematic and outdated information), (2) performs gradient descent on the update dataset (new and updated information), and (3) minimizes the KL divergence on the retain dataset (a small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets tailored to this new task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
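
The three-component objective above can be written down as a single scalar. This is a minimal sketch that computes the value from precomputed logits with NumPy; in practice it would be computed inside an autodiff framework on model outputs, so that minimizing it performs gradient ascent on term (1). The weights `w_unlearn`, `w_update`, `w_retain` and the KL direction KL(pretrained || modified) are assumptions, not details taken from the paper.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy; logits (n, vocab), targets (n,) integer ids."""
    lp = log_softmax(logits)
    return -lp[np.arange(len(targets)), targets].mean()

def kl_to_reference(ref_logits, new_logits):
    """Mean KL(p_ref || p_new) over rows (the direction is an assumption)."""
    lp_ref, lp_new = log_softmax(ref_logits), log_softmax(new_logits)
    return (np.exp(lp_ref) * (lp_ref - lp_new)).sum(axis=-1).mean()

def three_part_objective(unlearn_logits, unlearn_targets,
                         update_logits, update_targets,
                         retain_ref_logits, retain_logits,
                         w_unlearn=1.0, w_update=1.0, w_retain=1.0):
    """Value to minimize; the hypothetical weights balance the three terms."""
    return (-w_unlearn * cross_entropy(unlearn_logits, unlearn_targets)     # (1) reverse gradient
            + w_update * cross_entropy(update_logits, update_targets)       # (2) learn new information
            + w_retain * kl_to_reference(retain_ref_logits, retain_logits))  # (3) stay close on retain set
```

Here `retain_ref_logits` come from the frozen pretrained model and `retain_logits` from the model being edited, on the same retain-set inputs.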

arxiv preprint arxiv, dataset, llm surgery

arXiv.org Artificial Intelligence

2409.13054

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Machine Unlearning using a Multi-GAN based Model

Hatua, Amartya, Nguyen, Trung T., Sung, Andrew H.

arXiv.org Artificial Intelligence, Jul-25-2024

This article presents a new machine unlearning approach that uses multiple Generative Adversarial Network (GAN) based models. The proposed method comprises two phases: i) data reorganization, in which synthetic data generated by the GAN models is introduced and the class labels of the forget datasets are inverted, and ii) fine-tuning the pre-trained model. The GAN models consist of two generator-discriminator pairs, which generate synthetic data for the retain and forget datasets. A pre-trained model is then used to obtain class labels for the synthetic datasets. The class labels of the synthetic and original forget datasets are inverted. Finally, all combined datasets are used to fine-tune the pre-trained model to obtain the unlearned model. We performed experiments on the CIFAR-10 dataset and tested the unlearned models using Membership Inference Attacks (MIA). The label-inversion procedure and the synthetically generated data help the model acquire valuable information that enables it to outperform state-of-the-art models and other standard unlearning classifiers.
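
The data-reorganization phase can be sketched as assembling one fine-tuning set from real and GAN-generated samples. This is a minimal sketch under several assumptions: GAN training is omitted (the synthetic arrays are taken as given), `pretrained_predict` is a hypothetical labeling callback, "synthetic and original forget datasets" is read as the synthetic forget data plus the original forget data, and the inversion rule c -> (num_classes - 1 - c) is hypothetical because the abstract does not define it.

```python
import numpy as np

def invert_labels(labels, num_classes):
    # Hypothetical inversion rule: class c -> (num_classes - 1 - c). With an even
    # number of classes (e.g. CIFAR-10) no label is mapped to itself.
    return (num_classes - 1) - np.asarray(labels)

def build_finetune_set(retain_x, retain_y, forget_x, forget_y,
                       syn_retain_x, syn_forget_x, pretrained_predict, num_classes):
    """Combine real and synthetic samples; forget-side labels are inverted."""
    syn_retain_y = pretrained_predict(syn_retain_x)  # pseudo-labels, kept as-is
    syn_forget_y = invert_labels(pretrained_predict(syn_forget_x), num_classes)
    forget_y_inv = invert_labels(forget_y, num_classes)
    x = np.concatenate([retain_x, syn_retain_x, forget_x, syn_forget_x])
    y = np.concatenate([retain_y, syn_retain_y, forget_y_inv, syn_forget_y])
    return x, y
```

The returned `(x, y)` pair would then be used to fine-tune the pre-trained classifier, which is the second phase described in the abstract.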

dataset, forget dataset, retain dataset

arXiv.org Artificial Intelligence

2407.18467

Country:

North America > Canada (0.15)
North America > United States > California (0.14)
North America > United States > Mississippi > Forrest County > Hattiesburg (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.94)
Law (0.70)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Dissecting Language Models: Machine Unlearning via Selective Pruning

Pochinkov, Nicholas, Schoots, Nandi

arXiv.org Artificial Intelligence, Mar-2-2024

Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We introduce a selective pruning method for LLMs that removes neurons based on their relative importance to a targeted capability compared with overall network performance. This approach is a compute- and data-efficient way to identify and remove neurons that enable specific behaviours. Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others.
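
The scoring idea can be sketched as a ratio of per-neuron importances measured on two datasets. This is a minimal sketch under stated assumptions: mean absolute activation is used as the importance measure, the score is the ratio of forget-set to retain-set importance, and the pruning fraction `frac` is hypothetical; the abstract only says neurons are ranked by relative importance to the targeted capability versus overall performance.

```python
import numpy as np

def selective_prune_mask(forget_acts, retain_acts, frac=0.02, eps=1e-8):
    """Return a boolean keep-mask over neurons (False = prune).

    forget_acts, retain_acts: (num_samples, num_neurons) activations recorded on a
    dataset for the targeted capability and on a general dataset, respectively.
    """
    imp_forget = np.abs(forget_acts).mean(axis=0)  # importance for the targeted capability
    imp_retain = np.abs(retain_acts).mean(axis=0)  # importance for overall performance
    score = imp_forget / (imp_retain + eps)        # relative importance
    k = max(1, int(round(frac * score.size)))
    keep = np.ones(score.size, dtype=bool)
    keep[np.argsort(score)[-k:]] = False           # drop the k highest-scoring neurons
    return keep
```

Applying the mask would mean zeroing the pruned neurons' outgoing weights, for example `W_out[~keep, :] = 0` for a feed-forward layer whose rows index neurons (a layout assumption).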

dataset, neuron, pruning

arXiv.org Artificial Intelligence

2403.01267

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Dominican Republic (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Dataset Condensation Driven Machine Unlearning

Khan, Junaid Iqbal

arXiv.org Artificial Intelligence, Jan-31-2024

The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach to unlearning training data, retraining over the complement of the forget samples, is computationally expensive. These challenges have been effectively addressed through a collection of techniques falling under the umbrella of machine unlearning. However, existing methods still fall short in handling persistent computational challenges while preserving the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance among machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from a 'condensed model', which can be employed to quickly train any arbitrary model without being influenced by unlearning samples.
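
The naive baseline retrains on the complement of the forget samples; condensation aims to make that retraining cheap by shrinking the retain data first. The sketch below uses per-class averaged prototypes as a deliberately crude stand-in for dataset condensation; it is not the paper's condensation technique, and `per_class` and the random grouping are hypothetical choices. A model would then be retrained on the small returned set instead of the full retain set.

```python
import numpy as np

def condense_by_class_means(x, y, per_class=10, seed=0):
    """Crude condensation stand-in: split each class into up to `per_class` random
    groups and keep one averaged prototype per group."""
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for group in np.array_split(idx, min(per_class, idx.size)):
            protos.append(x[group].mean(axis=0))
            labels.append(c)
    return np.stack(protos), np.array(labels)

def condensed_retain_set(x, y, forget_idx, per_class=10, seed=0):
    """Drop the forget samples, then condense what remains (the retain set)."""
    keep = np.ones(len(y), dtype=bool)
    keep[np.asarray(forget_idx)] = False
    return condense_by_class_means(x[keep], y[keep], per_class, seed)
```

Because the forget samples are removed before condensation, none of the prototypes depend on them, and retraining cost scales with `per_class * num_classes` rather than with the size of the retain set.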

condensation, dataset, retain dataset

arXiv.org Artificial Intelligence

2402.00195

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
